APVQ encoder applied to wideband speech coding
Abstract
This paper describes a coding scheme for wideband speech (sampling frequency 16 kHz). We present a wideband speech encoder called APVQ (Adaptive Predictive Vector Quantization). It combines subband coding, vector quantization and adaptive prediction, as shown in Fig. 1. The speech signal is split into 16 subbands by a QMF filter bank, so every subband is 500 Hz wide. The APVQ encoder can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, each signal vector is formed from one sample of the normalized prediction error signal of each of the different subbands and is then vector quantized. The prediction error signal is normalized by its gain, and the normalized prediction error signal is the input to the VQ; the scheme is therefore an adaptive gain-shape VQ. The APVQ encoder combines the advantages of scalar prediction with those of vector quantization. We evaluate wideband speech coding in the range from 1.5 to 2 bits/sample, which corresponds to a coding rate from 24 to 32 kbps.
1. BASIC APVQ CODING STRUCTURE
The APVQ encoder combines several techniques: subband coding, adaptive vector quantization and adaptive backward linear prediction, as depicted in Fig. 1. The input signal x(n) is a wideband speech signal (0-8 kHz) sampled at Fs = 16 kHz. It is passed through a symmetric four-stage QMF (Quadrature Mirror Filter) bank structure, which splits the full-band speech signal into 16 subband signals. Let xi(n) be the speech signal in the i-th subband. Every subband signal xi(n) is 500 Hz wide and has been decimated by 16. To remove redundancy in every subband signal, adaptive backward scalar linear prediction is introduced: the predicted subband signal is subtracted from the subband signal xi(n), yielding a prediction error signal ei(n). As shown in Fig. 1.a, only the first 10 subbands use a backward predictor.
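The backward prediction loop described above can be sketched in a few lines of Python. This is a minimal illustration for a single subband, not the paper's implementation: the NLMS adaptation rule, the predictor order, the step size `mu` and the uniform scalar quantizer (a stand-in for the vector quantizer) are all assumptions, and the names `BackwardPredictor`, `encode` and `decode` are hypothetical. What it shows is that the predictor adapts only from quantized data, so the decoder can rebuild the same predictions without side information.

```python
import math


class BackwardPredictor:
    """Linear predictor adapted only from reconstructed samples.

    The NLMS update is an assumption; the source does not state
    which adaptation rule the APVQ encoder uses.
    """

    def __init__(self, order=2, mu=0.5, eps=1e-6):
        self.w = [0.0] * order      # predictor coefficients
        self.hist = [0.0] * order   # reconstructed samples, newest first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return sum(w * h for w, h in zip(self.w, self.hist))

    def update(self, e_q):
        """Consume one quantized prediction error, return the reconstructed sample."""
        recon = self.predict() + e_q
        energy = sum(h * h for h in self.hist) + self.eps
        self.w = [w + self.mu * e_q * h / energy
                  for w, h in zip(self.w, self.hist)]
        self.hist = [recon] + self.hist[:-1]
        return recon


def quantize(e, step=0.05):
    # Uniform scalar quantizer standing in for the VQ stage (assumption).
    return step * round(e / step)


def encode(x, step=0.05):
    p = BackwardPredictor()
    residuals, recon = [], []
    for sample in x:
        e_q = quantize(sample - p.predict(), step)  # quantized e_i(n)
        residuals.append(e_q)
        recon.append(p.update(e_q))
    return residuals, recon


def decode(residuals):
    p = BackwardPredictor()
    return [p.update(e_q) for e_q in residuals]


# Toy subband signal: a slow sinusoid.
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
residuals, enc_recon = encode(x)
dec_recon = decode(residuals)
max_err = max(abs(a - b) for a, b in zip(x, enc_recon))
```

Because `decode` sees exactly the residuals the encoder emitted and runs the same update, `dec_recon` matches `enc_recon` sample for sample, and the reconstruction error stays within half a quantizer step.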
The prediction gain in the remaining subbands is about 0 dB, so the backward linear predictor can be omitted there and its computational cost saved. In these subbands the quantization error outweighs the whitening ability of time-domain prediction. It must be borne in mind that the subband split already implies a kind of whitening in frequency. Because of their low energy content, the 15th and 16th subband signals can even be dropped from transmission without any subjective quality loss. We therefore evaluate the transmission quality of a 7 kHz-wide speech signal split into 14 subband signals. The APVQ encoder can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, each signal vector is formed from one sample of the normalized prediction error signal di(n) of each of the different subbands and is then vector quantized. The prediction error signal ei(n) is normalized by its gain, and the normalized prediction error signal di(n) is the input to the VQ; the scheme is therefore an adaptive gain-shape VQ. The APVQ encoder combines the advantages of scalar prediction with those of vector quantization because all previous samples of the subband signal xi(n) are available to the subband signal predictor. Because vector quantization at the full vector dimension is computationally too expensive, we handle the high dimensionality with a Multi-VQ. The Multi-VQ technique splits every signal vector into several subvectors to keep the computational complexity acceptable. However, the Multi-VQ structure requires an intelligent bit assignment among the vector quantizers of the subvectors. The number of subvectors and their lengths are discussed later in this paper for every coding rate: 24, 26, 28 and 32 kbps.
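The gain-shape normalization and the Multi-VQ split can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: the 4/5/5 split of the 14 subbands, the bits per subvector, the random (untrained) codebooks, and the exponential smoothing rule in `update_gains`; the function names are hypothetical as well. The sketch only shows the data flow: normalize ei(n) by the gain to get di(n), quantize each subvector separately, rescale at the decoder, and update the gains from quantized values only.

```python
import random


def nearest(codebook, v):
    """Index of the codevector closest to v in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def split(vec, lengths):
    """Cut a vector into consecutive subvectors of the given lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(vec[start:start + n])
        start += n
    return out


def encode_vector(e, gains, lengths, codebooks):
    d = [ei / gi for ei, gi in zip(e, gains)]  # normalized error d_i(n)
    return [nearest(cb, sv) for cb, sv in zip(codebooks, split(d, lengths))]


def decode_vector(indices, gains, codebooks):
    d_q = [c for cb, k in zip(codebooks, indices) for c in cb[k]]
    return [di * gi for di, gi in zip(d_q, gains)]  # reconstructed e_i(n)


def update_gains(gains, e_q, alpha=0.9, floor=1e-3):
    # Backward gain tracking from quantized output only (assumed smoothing rule).
    return [max(floor, alpha * g + (1 - alpha) * abs(v))
            for g, v in zip(gains, e_q)]


rng = random.Random(0)
lengths = [4, 5, 5]   # hypothetical split of the 14 transmitted subbands
bits = [4, 3, 3]      # hypothetical bits per subvector
codebooks = [[[rng.gauss(0, 1) for _ in range(n)] for _ in range(2 ** b)]
             for n, b in zip(lengths, bits)]

gains = [1.0] * 14
e = [rng.gauss(0, 0.5) for _ in range(14)]   # one prediction-error vector
indices = encode_vector(e, gains, lengths, codebooks)
e_q = decode_vector(indices, gains, codebooks)
gains = update_gains(gains, e_q)
```

Searching three small codebooks (16 + 8 + 8 codevectors here) instead of one codebook over all 14 dimensions is what keeps the search cost manageable, at the price of having to decide how many bits each subvector receives.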
We consider two techniques for performing an adequate bit assignment: the first uses fixed-length subvectors and a dynamic bit assignment among them; the second uses subvectors with similar gain, adaptive lengths and a uniform bit assignment among them. Both techniques are based on backward estimation of the subband gains, so no side information is needed because these values are available at both the encoder and the decoder. Furthermore, the subjective quality of the speech signal is enhanced by spectral weighting of the noise signal. When the first bit-assignment technique is used, several codebooks have to be designed for every subvector. Because of computational complexity, the codebook size has been limited to a maximum of 1024 codevectors, i.e., at most 10 bits per subvector are allowed. On the other hand, the backward structure forces us to use a minimum assignment of 3 or 4 bits per subvector to avoid a performance loss lasting several vectors. Every subvector therefore requires the design of several codebooks, whose sizes range from 8 to 1024 codevectors and whose dimension is the subvector length. The APVQ decoder scheme is depicted in Fig. 1.b. The received codewords select the codevectors corresponding to the quantized normalized prediction error subvectors. The gain estimate for every subvector sample allows the prediction error subband signal ei(n) of the i-th subband to be reconstructed at the receiver. The reconstructed subband signal xi(n) is then obtained by adding the predicted subband signal to the prediction error subband signal ei(n). It must be noted that both gain and predicted subband signal estimations are available in the
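The first bit-assignment technique (fixed-length subvectors, dynamic bits) can be sketched as a greedy allocation in Python. This is an illustration under stated assumptions rather than the authors' rule: the distortion model D = n * sigma^2 * 2^(-2b/n) is the usual high-resolution approximation, and the gain values, the 4/5/5 split and the function name `allocate_bits` are hypothetical. The limits of 3 and 10 bits per subvector come from the text. The budget of 24 bits per vector follows from the 24 kbps rate, since subband signals decimated by 16 from 16 kHz produce 1000 vectors per second.

```python
def allocate_bits(subvector_energies, lengths, total_bits, b_min=3, b_max=10):
    """Greedy allocation: each extra bit goes where it lowers modelled distortion most."""
    bits = [b_min] * len(lengths)
    if sum(bits) > total_bits:
        raise ValueError("total_bits too small for the minimum assignment")

    def distortion(j, b):
        # High-resolution model (an assumption): D = n * sigma^2 * 2^(-2b/n)
        n = lengths[j]
        return n * subvector_energies[j] * 2.0 ** (-2.0 * b / n)

    for _ in range(total_bits - sum(bits)):
        candidates = [j for j in range(len(bits)) if bits[j] < b_max]
        if not candidates:
            break  # every subvector already uses its largest codebook
        best = max(candidates,
                   key=lambda j: distortion(j, bits[j]) - distortion(j, bits[j] + 1))
        bits[best] += 1
    return bits


# Hypothetical backward gain estimates for the 14 transmitted subbands.
gains = [1.0, 0.9, 0.8, 0.6, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05]
lengths = [4, 5, 5]

# Mean squared gain of each subvector serves as its energy estimate.
energies, start = [], 0
for n in lengths:
    band = gains[start:start + n]
    energies.append(sum(g * g for g in band) / n)
    start += n

bits = allocate_bits(energies, lengths, total_bits=24)
print(bits)
```

Since the gains are estimated backward, the decoder can run the same allocation on the same values and select the same codebooks, which is why no side information about the bit assignment has to be sent.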
Similar resources
Tree Encoding for the ITU-T G.711.1 Speech Coder
This paper examines enhancement to ITU-T Recommendation G.711.1 PCM wideband extension speech coder. To further improve the core lower-band coding performance the use of vector quantization and delayed decision coding is studied. A particular case of delayed decision coding, tree encoding, is implemented in the above standard. The bitstream is compatible with both the legacy G.711 and the G.711...
Realtime implementation of high-quality 32 kbps wideband LD-CELP coder
The Wideband-Audio Low-Delay CELP (LD-CELP) coder produces speech with quality as high as the CCITT 64 kb/s standard (G.722) at half the bitrate. The computational load of the encoder is almost 900% processor time of the 12.5 MIPS DSP32c. This makes a real-time implementation impractical. We investigated the Gain-Shape Vector-Quantization (GSVQ) in order to reduce the computational load of the ...
Variable Bit Rate Cont Diagram Approx
In this paper, we present a variable bit rate control method for speech/audio coding, under the constraint that the total bit rate of a super-frame is constant. The proposed method uses a trellis diagram for optimizing the overall quality of the super-frame. In order to reduce the computational complexity, the trellis diagram uses approximation by ignoring the encoder memory state between ...
Packet Loss Concealment for G.722 using Side Information with Application to Voice over Wireless LANs
The G.722 wideband speech codec offers higher quality and better naturalness than G.711, is low in complexity, has low delay, and tandems well with other codecs. This makes it an attractive codec for Voice over IP and Voice over Wireless LANs. However, loss of a G.722 coded speech frame results in a mismatch of the encoder/decoder states that affects the decoding of subsequent correctly receive...
Low bit rate wideband WI speech coding
This paper investigates Waveform Interpolation (WI) applied to low bit rate wideband speech coding. An analysis of the evolutionary behaviour of wideband Characteristic Waveforms (CWs) shows that direct application of the classical WI algorithm may not be appropriate for wideband speech. We propose a modification whereby CW quantisation is performed using classical WI decomposition for the low fre...
Publication date: 1996